π§ What is Moran's I?
Moran's I is the most widely used measure of spatial autocorrelation β it tells us whether similar values tend to cluster together in space, or whether they are randomly or negatively arranged.
$$I = \dfrac{N}{W} \cdot \dfrac{\displaystyle\sum_{i}\sum_{j} w_{ij}\,(x_i - \bar{x})(x_j - \bar{x})}{\displaystyle\sum_{i}(x_i - \bar{x})^2}$$
NNumber of spatial units (regions)
WSum of all spatial weights wα΅’β±Ό
wSpatial weight matrix β encodes "who is neighbor to whom"
xVariable of interest (e.g., unemployment rate)
Interpretation
I β +1 β Strong positive autocorrelation (clusters)
I β 0 β Spatial randomness
I β β1 β Dispersed / checkerboard pattern
Expected value under randomness: E[I] = β1/(Nβ1) β 0 for large N
πΈπͺ The Data: Swedish NUTS-2 Regions
Sweden is divided into 8 NUTS-2 regions. We'll use simulated unemployment rates (%) that mirror real northβsouth patterns, where the more urbanized south tends to have lower unemployment.
Hover over regions to explore
π Step 1: Build the Spatial Weights Matrix
The weights matrix W defines which regions are neighbors. The simplest approach is queen contiguity (shared border or corner). In PySAL, we use libpysal:
import geopandas as gpd
import libpysal
from esda.moran import Moran
import splot.esda as splot
import matplotlib.pyplot as plt
# Load Swedish NUTS-2 shapefile (or GeoPackage)
sweden = gpd.read_file("sweden_nuts2.gpkg")
# Create Queen contiguity weights
w = libpysal.weights.Queen.from_dataframe(sweden)
w.transform = 'r' # row-standardize: each row sums to 1
# Inspect
print(w.n) # 8 regions
print(w.mean_neighbors) # average neighbors per region
Row-standardization (transform='r') means each weight wα΅’β±Ό = 1/nα΅’ where nα΅’ is the number of neighbors of i. This makes the spatial lag a proper local average of neighbors.
Weights Matrix (Queen Contiguity)
π Step 2: Compute Moran's I
# Compute Global Moran's I
mi = Moran(sweden['unemployment'], w)
print(f"Moran's I = {mi.I:.4f}")
print(f"Expected I = {mi.EI:.4f}")
print(f"p-value = {mi.p_sim:.4f}") # based on 999 permutations
print(f"z-score = {mi.z_sim:.4f}")
0.012
p-value (999 perms)
Interpretation: I = 0.612 is substantially above the expected value (β 0). The p-value of 0.012 means there is only a 1.2% chance of observing this level of clustering by random chance. Unemployment rates in Sweden are significantly clustered in space. High-unemployment regions border other high-unemployment regions (mainly in the north), and low-unemployment regions cluster in the south.
π Step 3: The Moran Scatter Plot
The Moran scatter plot shows each region's standardized value (z) on the X axis vs. its spatial lag (average of neighbors' z-scores) on the Y axis. The slope of the regression line is Moran's I.
# Plot Moran scatter plot using splot
fig, ax = splot.esda.moran_scatterplot(mi, aspect_equal=True)
plt.title("Moran Scatter Plot β Swedish Unemployment")
plt.show()
Quadrant Meanings
Q1 β High-High (HH): Region has high unemployment AND its neighbors do too. Spatial cluster of disadvantage.
Q2 β Low-High (LH): Region is low but surrounded by high. Spatial outlier.
Q3 β Low-Low (LL): Region and its neighbors all have low unemployment. Cluster of prosperity.
Q4 β High-Low (HL): High region surrounded by low. Spatial outlier.
The slope of the red line = Moran's I β 0.612
π Step 4: Local Moran's I (LISA)
Global Moran's I gives one number for the whole country. Local Indicators of Spatial Association (LISA) decompose it to identify where clusters are.
from esda.moran import Moran_Local
# Compute Local Moran's I
lm = Moran_Local(sweden['unemployment'], w)
# lm.q tells us the quadrant: 1=HH, 2=LH, 3=LL, 4=HL
# lm.p_sim gives the pseudo p-value per region
sweden['lisa_q'] = lm.q
sweden['lisa_p'] = lm.p_sim
sweden['sig'] = lm.p_sim < 0.05
# Plot significance map
splot.esda.lisa_cluster(lm, sweden, p=0.05)
plt.show()
Result for Sweden: Northern regions (Norra Mellansverige, Γvre Norrland) appear as HH clusters β significantly high unemployment surrounded by high unemployment. The Stockholm region appears as part of an LL cluster β low unemployment in a prosperous neighborhood.
π Complete PySAL Workflow
import geopandas as gpd
import libpysal
from esda.moran import Moran, Moran_Local
import splot.esda as splot
import matplotlib.pyplot as plt
import numpy as np
# ββ 1. Load data βββββββββββββββββββββββββββββββββββββββββββββ
sweden = gpd.read_file("sweden_nuts2.gpkg")
y = sweden["unemployment"]
# ββ 2. Weights ββββββββββββββββββββββββββββββββββββββββββββββββ
w = libpysal.weights.Queen.from_dataframe(sweden)
w.transform = "r"
# ββ 3. Global Moran's I βββββββββββββββββββββββββββββββββββββββ
mi = Moran(y, w)
print(f"I={mi.I:.3f}, p={mi.p_sim:.3f}")
fig, axs = plt.subplots(1, 2, figsize=(12, 5))
# Choropleth map
sweden.plot(column="unemployment", ax=axs[0],
cmap="YlOrRd", legend=True)
axs[0].set_title("Unemployment Rate (%)")
# Moran scatter plot
splot.esda.moran_scatterplot(mi, ax=axs[1])
axs[1].set_title(f"Moran Scatter (I={mi.I:.3f})")
plt.tight_layout()
plt.show()
# ββ 4. Local Moran's I (LISA) ββββββββββββββββββββββββββββββββ
lm = Moran_Local(y, w, seed=42)
fig2, ax2 = plt.subplots(figsize=(6, 8))
splot.esda.lisa_cluster(lm, sweden, p=0.05, ax=ax2)
ax2.set_title("LISA Cluster Map β Sweden")
plt.show()
β
Key Takeaways
1Spatial weights encode the neighborhood structure β always row-standardize for Moran's I
2Global Moran's I gives a single summary statistic for the whole study area
3Significance is assessed by permutation (999 random shuffles), not parametric tables
4Local Moran's I (LISA) identifies specific cluster locations and types (HH, LL, HL, LH)
Sweden's unemployment story: The northβsouth gradient in Swedish unemployment isn't random β it's spatially structured. Moran's I β 0.61 with p < 0.05 tells us this pattern is unlikely to have arisen by chance.
Libraries used: libpysal, esda, splot, geopandas, matplotlib. Install with: pip install pysal splot geopandas